Project Report

GGAP

Author

Alyssa Shou, Grace Chang, Grace Shao, and Paisley Lucier

Published

December 4, 2023

Abstract
This report analyzes Chicago crime data from the year 2022. First, the variation of crime characteristics by hour (crime type, location, and violence level) is explored to address people’s concerns and stereotypes regarding peak crime hours. The findings yield recommendations for stakeholders to remain vigilant throughout the day, as public crime rates during daylight hours are very similar to those at night. Next, theft is examined across the government-delineated community areas of Chicago. The region composed of The Loop, Near North Side, Near West Side, and West Town exhibits the highest levels of theft, so stakeholders are urged to pay particular attention to their belongings in these areas. Community area crime distributions are then analyzed with respect to CTA stations. CTA riders are advised to be especially cautious at the Roosevelt, Jackson, and 95th/Dan Ryan stations, which are found to be the most dangerous, featuring high rates of theft and physical crimes. Finally, the proportion of crimes resulting in arrest is assessed by police district, and the police are encouraged to allocate more resources to several districts exhibiting a large disparity in arrest proportion, particularly for robbery in districts 5, 3, 6, and 12.

1 Background / Motivation

Chicago is often deemed one of the most dangerous cities in the United States, and people are typically concerned with this issue when visiting or living in the city. As the four of us are students at Northwestern—in close proximity to Chicago—we frequent the city ourselves, and have personally witnessed crimes committed. For example, on the Red Line, a common mode of transportation for many Chicago residents and tourists, there are many visible crimes that scare people away from using the subway. Therefore, we were interested in analyzing crimes in Chicago in order to see which types of crimes we should be more vigilant of, or where in Chicago we should practice extra caution. These analyses are crucial to our safety and the safety of others like us, who share similar experiences. Additionally, police inequality or inefficient allocation of resources in policing has been a prevalent issue in the United States recently, so we wanted to explore potential areas of disparity.

In a previous year, another group did a similar analysis on Chicago crime data for a different time frame. In the appendix, we speak to any overlap or differences we have with their project.

2 Problem statement

  1. Alyssa’s question looks at how types of crime are distributed throughout the day. Crime characteristics are expected to vary by hour. We wanted to know which types of crimes are most common in the daytime vs. nighttime, where crimes are commonly committed, and whether or not they are violent.

  2. Grace Chang’s question examines theft, the most common type of crime, accounting for around 23% of total crimes, in further detail. We were interested in how theft was distributed across the various community areas of Chicago (e.g. The Loop), and how the population density of these areas relates to their respective theft rates.

  3. Grace Shao’s question explores the most dangerous stations and community areas to ride the CTA in. We wanted to know the typical profiles of CTA crimes, specifically, where they are committed, whether they are committed on the train or platform, and which time of day to avoid certain stations.

  4. Paisley’s question examines associations between the proportion of crimes that resulted in arrest and the crime’s location in the city of Chicago, particularly with regard to police districts. How does the arrest proportion vary depending on what part of the city the crime occurs in?

3 Data sources

3.0.1 Primary Dataset:

To conduct our analysis, our primary dataset was Chicago crime data for 2022 as reported by the city of Chicago: https://data.cityofchicago.org/Public-Safety/Crimes-2022/9hwr-2zxp

The data reports 239,043 observations of crimes in the city of Chicago (at the time of download, as this data is being updated to this day), and information about the crime such as its location, time/date, and whether or not the crime resulted in arrest.

3.0.2 Supporting datasets include:

  • Chicago community areas by numeric code, population, area, and population density: https://en.wikipedia.org/wiki/Community_areas_in_Chicago
    • Since the original dataset includes the numeric code of the community areas, to make our analysis more usable and readable, we merged the two datasets to include community area names. Additionally, the population density of the community areas was used in the theft data analysis to create a regression.
  • IUCR codes https://data.cityofchicago.org/widgets/c7ck-438e
    • Used in Alyssa’s analysis solely as reference to find violent crime type IUCR codes but not actually merged with main dataset.
  • CTA stations coordinates: https://data.cityofchicago.org/Transportation/CTA-System-Information-List-of-L-Stops/8pix-ypme
    • The latitude and longitude columns are used to find the nearest subway station to each crime.
  • Police sentiment data via the city of Chicago: https://data.cityofchicago.org/Public-Safety/Police-Sentiment-Scores/28me-84fj/data
    • This dataset is a compilation of collected survey data about residents’ feeling towards police based on their responses to 4 questions: 1) rating the safety of their neighborhood, 2) rating how they feel the police in their neighborhood listen to concerns of local residents, 3) rating how well the police in their neighborhood treat local residents with respect, and 4) rating trust in their police. (Note: responses were scored on a scale of 1-10, and the data is compiled to multiply scores by 10 so an average rating of 60 in the dataset corresponds to an average score of 6)

4 Stakeholders

Our primary purpose is to help stakeholders understand crime in the city of Chicago. This understanding helps general parties make better choices to promote public and personal safety.

  • Chicago residents/visitors: Residents and visitors will benefit from our analysis by using our recommendations to more safely navigate the city and transit stations and make housing decisions.

  • Police force: For the police force, we hope our analysis can give them direction on how to better serve and satisfy communities across districts and determine where to focus resources to create a safer Chicago.

5 Data quality check / cleaning / preparation

5.1 Continuous variables: Main and Supporting Datasets

           Year       Latitude      Longitude           Hour  Population Density  Safety Score  Trust Score  Respect Score  Listening Score
count  239043.0  234936.000000  234936.000000  239043.000000        54845.000000   1164.000000  1164.000000    1164.000000      1164.000000
mean     2022.0      41.845596     -87.668599      12.317633         6984.499435     57.442328    57.462148      58.680120        56.244296
std         0.0       0.088833       0.061009       6.985090         3602.179700      5.389730     6.707499       7.096675         6.634690
min      2022.0      36.619446     -91.686566       0.000000          388.360000     33.980000    38.040000      39.310000        35.040000
25%      2022.0      41.769150     -87.710149       7.000000         4405.730000     54.137500    52.427500      53.540000        51.525000
50%      2022.0      41.862981     -87.661469      13.000000         6226.000000     57.470000    57.410000      58.420000        56.360000
75%      2022.0      41.909017     -87.626402      18.000000         9516.370000     61.380000    61.890000      63.667500        60.620000
max      2022.0      42.022548     -87.524532      23.000000        14863.580000     71.110000    77.100000      78.850000        75.630000

5.2 Main Dataset Categorical Variables

   Name of Variable                  Unique Values  Missing Values  Most Common Value            Second Most Common              Third Most Common
0  IUCR                                        304               0  Level: 0810, Count: 20096    Level: 0820, Count: 18863       Level: 0486, Count: 18679
1  Description                                 284               0  Level: SIMPLE, Count: 27207  Level: OVER $500, Count: 20096  Level: $500 AND UNDER, Count: 18863
2  Location Description                        134             881  Level: STREET, Count: 67630  Level: APARTMENT, Count: 45596  Level: RESIDENCE, Count: 30470
3  Arrest                                        2               0  Level: False, Count: 211218  Level: True, Count: 27825
4  District                                     23               0  Level: 8, Count: 14811       Level: 6, Count: 14709          Level: 12, Count: 14353
5  Community Area                               77               0  Level: 25, Count: 12251      Level: 8, Count: 10608          Level: 28, Count: 9496
6  Community Name (From theft data)             77               0  Level: 0, Count: 4251        Level: 1, Count: 3464           Level: 2, Count: 3373

5.3 Individual Data Preparation & Cleaning

Alyssa: When preparing my data, I dropped the 881 observations with missing values in Location Description. A key point of my analysis is checking whether a crime was committed in a public or private setting, so I could not use observations where the location was not reported. I did not drop any other observations before I began using the dataset. Throughout my analysis, I sliced the data so that each time frame I analyzed had its own dataframe, and I used those dataframes to continue my EDA.

Grace Chang: When initially using the data, I had to subset the data such that only the observations with the Primary Type listed as “Theft” remained. For a more thorough analysis, considering that I did not use latitude or longitude in my analysis, I did not drop any missing values—most missing values were located in those two columns. I merged this dataset with one including information on the Community Area names, such that I could pair up the numeric codes of the Community Areas as listed in the raw dataset with their popularly known names.
In order to accomplish this, I had to clean the raw theft data so that the numeric codes were formatted in the same fashion as in the community area dataset. After producing the final, merged dataset, I also cleaned the observations so that the formatting of the community names was consistent.
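
The key-cleaning and merge steps described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the column names, the toy rows, and the specific cleaning rules (casting codes to integers, stripping parentheses and stray whitespace from names) are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the two tables; column names and values are illustrative only.
thefts = pd.DataFrame({"Community Area": ["32.0", "8.0", "32.0", "28.0"]})
areas = pd.DataFrame({
    "Code": [8, 28, 32],
    "Community Name": ["near north side ", "Near West Side", "(The) Loop"],
})

# Put the join key in the same format on both sides before merging.
thefts["Code"] = thefts["Community Area"].astype(float).astype(int)

# Normalize the names so that each area is always spelled the same way.
areas["Community Name"] = (
    areas["Community Name"]
    .str.replace(r"[()]", "", regex=True)
    .str.strip()
    .str.title()
)

# A left merge keeps every theft record; validate guards against duplicate codes.
merged = thefts.merge(areas, on="Code", how="left", validate="many_to_one")
print(merged["Community Name"].value_counts())
```

Using `validate="many_to_one"` makes the merge fail loudly if a community area code appears twice in the lookup table, which would otherwise silently duplicate crime records.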

Grace Shao: I began by subsetting the data to only include crimes occurring in a CTA station or on a CTA train. While I did not have to change any of the values in the dataset, I did remove the observations with NA values in the longitude or latitude, since I needed coordinates to map each observation later in the analysis. Only around 1% of my data had NA values, so the removal did not have significant repercussions for my analysis. Additionally, I added the names of the community areas in order to enhance readability and make it easier to make my graphs. The original data only included the numeric code of the community area for each observation, so I merged my data with a Wikipedia table that matched numeric codes with community area names.

Paisley: For my analysis concerning arrests and police districts, I did not consider the 14 observations in district 31, as this district is split between the North and South sides and had so few observations (less than 1% of the data). Additionally, I added a new column for my analysis that binned the data by side based on its district, referencing this source: https://news.wttw.com/sites/default/files/Map%20of%20Chicago%20Police%20Districts%20and%20Beats.pdf. For much of my analysis I only considered the top 10 crimes to ensure that one district with very few observations did not skew the data (note: in subsetting the top 10 crimes, I omitted the crime type ‘other offense’ due to the range of crimes within it). Lastly, for the police sentiment data that I worked with, I kept only the survey scores that were recorded in 2022 to match the crime dataset. Since the scores for safety, trust, respect, and listening were all very highly correlated with one another (correlation coefficients all > 0.9), I aggregated these scores by taking their mean in each district.

6 Exploratory Data Analysis

6.1 Analysis 1

By Alyssa Shou

For my question on how crime type varies throughout the day, I started by graphing a full distribution of the number of crimes per hour. Based on this line graph, I saw that there are peaks at 12 am and 12 pm, so I used these two hours as time frames to specifically analyze.

I was also interested in analyzing rush hour time frames because there are more people commuting during those times and thus a higher potential for crime. Morning rush hour is considered 6-9 am and evening rush hour is considered 4-7 pm.

For each time frame, I have graphed the top 10 types of crimes and the top 10 locations at which crimes occur. These graphs are shown below.
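
A minimal sketch of the hourly count and time-frame slicing described above is given here. The rows are toy data and the column names are assumptions. The hour ranges are also an assumption: treating 6-9 am as hours 6, 7, and 8 and 4-7 pm as hours 16, 17, and 18, which together with noon and midnight gives eight analyzed hours.

```python
import pandas as pd

# Toy records; the real dataset has one row per reported crime.
crimes = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-01-01 00:15", "2022-01-01 07:30", "2022-01-01 12:00",
        "2022-01-02 12:45", "2022-01-02 17:10", "2022-01-03 00:05",
    ]),
    "Primary Type": ["THEFT", "BURGLARY", "THEFT", "BATTERY", "THEFT", "BATTERY"],
})
crimes["Hour"] = crimes["Date"].dt.hour

# Number of crimes per hour of day, in chronological order (basis for the line graph).
per_hour = crimes["Hour"].value_counts().sort_index()

# One dataframe per time frame (hour boundaries are an assumption, see above).
frames = {
    "Morning Rush": crimes[crimes["Hour"].between(6, 8)],
    "Noon": crimes[crimes["Hour"] == 12],
    "Evening Rush": crimes[crimes["Hour"].between(16, 18)],
    "Midnight": crimes[crimes["Hour"] == 0],
}

# Top 10 crime types within each time frame.
top_types = {
    name: frame["Primary Type"].value_counts().head(10)
    for name, frame in frames.items()
}
print(per_hour)
```

The same `value_counts().head(10)` call on a location column would give the top 10 locations for each time frame.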

After finding the top 10 types of crime at each time, we can observe that the top crimes are very similar throughout the day. On that front, I did not see much variation, but there are some minor differences.

  • Midnight is the only time where retail theft is not in the top 10
  • Forced burglary is a top 10 crime at morning rush
  • Theft from building is a top 10 crime during evening rush hour
  • Comparatively, vehicle-related crimes (i.e., damage and theft) are less common at noon than at other times

Finding the top 10 crimes at each time was the first thing I tried since it seemed like the most logical first step in the analysis. I did not anticipate or run into any problems at this step. We can see that even the differences above are not that significant and 9/10 crimes between each time frame overlap. Each of these differences can be explained with common sense logic. For example, forced burglary may be more common during morning rush hour because burglars are aware that houses will be empty as people leave to commute to work.

Public vs. Private Crime Rates
After looking at the top 10 types of crimes and observing no egregious variation, my next step was to analyze crime locations. The top three locations are the same for every time frame: Street, Apartment, Residence. After those three locations, the rest of the locations are clearly less common. I then took the total number of crimes committed among the top 10 crimes and found the proportion of them that were committed in public areas such as street, sidewalk, public garage, etc.

  • Morning Rush: 49.6%
  • Noon: 45.9%
  • Evening Rush: 62.9%
  • Midnight: 46.4%

Crimes are the most public during evening rush hour, while the other three time frames show about the same share of public crime. I believe exploring crime location is important since these are general locations that stakeholders should avoid during specific times. The share of public crime is a useful way to measure safety because public crimes tend to be the most common crimes and are often impulsive actions such as assault/battery and theft.
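
One way the public share could be computed is sketched below. The set of locations counted as public is hypothetical, the rows are toy data, and the sketch assumes the proportion is taken over the ten most common locations in a time frame, which may differ from the exact subset used in the analysis.

```python
import pandas as pd

# Hypothetical set of location labels treated as public.
PUBLIC_LOCATIONS = {"STREET", "SIDEWALK", "PARKING LOT / GARAGE (NON RESIDENTIAL)"}

def public_share(frame: pd.DataFrame, top_n: int = 10) -> float:
    """Share of crimes in the top_n locations that occurred in a public location."""
    top = frame["Location Description"].value_counts().head(top_n)
    public_count = top[top.index.isin(PUBLIC_LOCATIONS)].sum()
    return float(public_count / top.sum())

# Toy time-frame slice: 3 of 5 crimes happen in public locations.
evening = pd.DataFrame({
    "Location Description": ["STREET", "APARTMENT", "SIDEWALK", "RESIDENCE", "STREET"],
})
print(f"{public_share(evening):.1%}")
```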

At first, I tried to keep all the time frames in a side-by-side barplot to keep the visualization cleaner and more brief. I quickly realized that this would not work because the top 10 locations for crimes at different times throughout the day would not be the same. Therefore, there would be parts of the graph with empty or very low bars for locations that are not included in the top 10 locations for all times. This would be misleading to stakeholders so I chose to slice the data and create separate bar graphs for clarity and organization.

Rate of Violent Crimes
Stakeholders are often concerned about violent crime. A common stereotype of Chicago is that it is a very dangerous city, and many people unconsciously think of violent crimes when they think of crime in general. Because of this, I created a new column in the dataset that labels each observation as violent or non-violent (see code file) according to its Illinois Uniform Crime Reporting (IUCR) code. Violent crimes include armed robbery, aggravated domestic battery, and first-degree murder. Non-violent crimes include pocket-picking theft, forgery, unlawful entry/trespassing, etc. Below are the percentages of reported crimes in each time frame that are labeled as violent.

  • Morning Rush: 23.69%
  • Noon: 18.88%
  • Evening Rush: 23.93%
  • Midnight: 21.88%

Unfortunately, these rates are not negligible: roughly one in five reported crimes is violent at all four times of day. In the recommendations, I will provide action items for stakeholders based on this analysis.
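
The violent/non-violent labeling can be sketched as a membership test on the IUCR code. The set of codes below is a small hypothetical sample chosen for illustration; the full list used in the analysis lives in the code file and is not reproduced here.

```python
import pandas as pd

# Hypothetical sample of IUCR codes treated as violent (illustration only).
VIOLENT_IUCR = {"0110", "031A"}

# Toy records: IUCR codes are strings because some contain letters.
crimes = pd.DataFrame({
    "IUCR": ["0110", "0810", "031A", "0820", "0810"],
    "Hour": [0, 12, 0, 12, 12],
})

# Boolean column: True when the code is in the violent set.
crimes["Violent"] = crimes["IUCR"].isin(VIOLENT_IUCR)

# The mean of a boolean column is the share of True values.
pct_violent_by_hour = crimes.groupby("Hour")["Violent"].mean() * 100
print(pct_violent_by_hour)
```

Keeping the codes as strings avoids losing leading zeros and handles alphanumeric codes without special cases.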

Note: the previous year’s project group also analyzed crime at various hours of day. See Appendix A2 for more information

6.2 Analysis 2

By Grace Chang

Since I am conducting research on theft, I subset the data so that only data with the Primary Type “Theft” remained; therefore, I could perform my subsequent analyses on only the theft data. I firstly wanted to see how the types of theft varied across the seventy-seven government-delineated Chicago community areas. In order to perform this analysis, I looked for the top twelve community areas with the highest number of observations of theft crimes, and subset the data such that only the data for these twelve areas remained. I focused on only these twelve observations because I wanted to visualize for stakeholders which areas they should pay particular attention to in terms of how often theft crimes are observed there, and how types of theft crimes compare across those twelve areas.
In order to visualize these statistics, I decided on using a stacked bar-plot. Originally, I tried to use a series of line plots—one plot for each community area in the top twelve, featuring the number of thefts of a given type of theft during each month of the year—via the Seaborn FacetGrid method. This did not work as there was a drastic difference between the number of occurrences in certain categories of theft for some community areas during most of the months (likely due to an insufficient amount of observations in several theft type-month groups). As a result, the scale of the plots within the FacetGrid, while they were consistent, did not match up well with the scale of the lines plotted, and the visualization was difficult to view and interpret. The stacked bar-plot removed the month factor, but I realized by attempting the FacetGrid that the month did not matter much. The stacked bar-plot makes it easy to see the proportions of the various theft types and visually compares the frequencies for the different community areas.

Based on this plot, I noticed that theft over 500 dollars outnumbers theft under 500 dollars in most of the community areas, with a notable difference in the ‘West Town’ and ‘Near West Side’ areas, showing that theft of greater financial value is more common than that of lesser value. This is further supported by the overall percentages of each theft type, where theft over 500 dollars accounts for 33% and theft under 500 dollars for around 28%. While the gap between these two categories is not large, together they make financial theft by far the most common type of theft. The plot exhibits the same trend: theft of monetary assets accounts for the greatest proportion of thefts, and retail theft is also common. One surprising observation from this plot is that pick-pocketing makes up only a small part of thefts across the twelve community areas compared to retail theft and financial theft.
Finally, I observed that the four areas with the most theft occurrences were concentrated in the same area—The Loop, Near North Side, Near West Side, and West Town border each other, and this region is also often described as downtown Chicago by visitors and residents. This observation implies that people should be particularly wary in these areas, especially seeing as they are popular areas to live and visit. Based on this context, I further questioned whether there was a relationship between the population density of a community area and the number of thefts in that area. In order to attack this problem, I created a dataset via merging that included the community area names, their corresponding numbers of thefts, and their population densities. I then utilized this dataset to plot a linear regression relating population density and number of thefts.
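
A simple least-squares fit of theft count on population density might look like the sketch below. The numbers are made-up toy values, not figures from the merged dataset, and NumPy's `polyfit` is used here only as one convenient way to fit a line.

```python
import numpy as np

# Toy values: population density (per sq km) and theft counts for five areas.
density = np.array([2000.0, 4000.0, 6000.0, 8000.0, 10000.0])
thefts = np.array([300.0, 700.0, 900.0, 1500.0, 1600.0])

# Fit thefts = slope * density + intercept by ordinary least squares.
slope, intercept = np.polyfit(density, thefts, deg=1)

# Pearson correlation describes how tightly the points follow the line.
r = np.corrcoef(density, thefts)[0, 1]

print(f"slope={slope:.3f} thefts per unit density, intercept={intercept:.1f}, r={r:.3f}")
```

A positive slope with a correlation well below 1 would match the pattern described here: density matters, but it does not explain theft counts on its own.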

Community Areas Ranked by Population Density
     Community Name  Density (sqkm)
0   Near North Side        14863.58
4         Lake View        12752.44
15        Edgewater        12491.89
9       Rogers Park        11672.81
1          The Loop         9897.73
38      Albany Park         9732.13
10           Uptown         9516.37
6      Lincoln Park         8612.96
12       West Ridge         8435.36
55          Hermosa         7940.46
20   Belmont Cragin         7713.71
7      Logan Square         7707.48

It is important to note that while there is a general positive correlation, only two of the four community areas with the highest numbers of theft fall among the twelve most population-dense areas. I compared the top four with the top twelve because the top four areas of interest were originally extracted from the twelve community areas with the most theft overall. This observation implies that variables other than population density likely affect the number of thefts in an area, although the trend suggests that population density is still a relevant predictor. Since the Near North Side (rank 1 in population density) and The Loop (rank 5 in population density) are much denser than West Town and the Near West Side, and border these two areas on the east, a potential explanation is that many people travel between these areas, bringing a spill-over of theft crimes into the two less dense areas.

6.3 Analysis 3

By Grace Shao

I began by investigating the location of CTA crimes. For Chicagoans to know which locations they should avoid, it is important to know which community areas have the most crime. I found the 10 community areas with the highest number of crimes and graphed them below in descending order. The Loop has the most crime by far: compared to the community area with the second-most crime, The Loop still had more than 4 times as much crime.

This graph matches the map of CTA crimes as shown below. I wanted to create this map to visualize where crimes were happening and highlight that The Loop had a very high number of crimes, shown with the high density of points in that area. Since each community area has a different color, it also helps visualize how crimes are spread out across different areas. With this question, I did not anticipate that I would have to make many changes to make the map more readable. I changed the color scale to correspond with the community area, decreased opacity, and increased the zoom to focus on The Loop. This process took a lot of trial and error, especially since plotly was a new library that I had never used before.

Since The Loop represented such a large portion of the crimes committed and is an extremely popular area of Chicago (The Bean, Art Institute, and River Walk are all located there), I wanted to do further analysis on it. Subsetting the data to include only The Loop, I found the most common crimes occurring on CTA stations within that community area. For the top 6 crime types, I found that 3 were theft related and 3 were more physical and violent. Simple battery was the most common crime overall, while pickpocketing was the second most common crime.

I thought it was interesting that theft-related crimes were much more likely to happen on the train. Physical crimes, by contrast, had a noticeably higher proportion occurring on the platform than theft did.

Now that I had established The Loop as the most dangerous community area, I wanted to also pinpoint the most dangerous stations. During my data analysis, I ran into a problem. The dataset did not include which station the crime was committed at – only the longitude and latitude. I had anticipated that the crimes might be clustered and easy to identify on a map, but I quickly found that in areas with many stations it was difficult to visually identify which station the crime belonged to, and would be a very slow process to do manually. I decided instead to import a list of the CTA stations and their coordinates.

For each crime, I found the closest station by longitude and latitude using the Haversine formula, which is recommended for coordinate calculations because it computes the great-circle distance between two points on a sphere\(^{1}\). I then found and created a graph of the top 3 most dangerous stations. While there are a few outliers (one CTA crime shows up in a different state), I left them in the map to accurately portray all of the crimes reported. I expected calculating the nearest station to be a successful approach because it automatically and efficiently assigns each crime to a station.

  1. “Haversine Formula to Find Distance between Two Points on a Sphere.” GeeksforGeeks, GeeksforGeeks, 5 Sept. 2022, www.geeksforgeeks.org/haversine-formula-to-find-distance-between-two-points-on-a-sphere/.
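
The nearest-station assignment can be sketched with the standard library alone. The station coordinates below are approximate values included for illustration, and the function names are hypothetical; the analysis itself drew coordinates from the CTA stops dataset.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Approximate coordinates, for illustration only.
STATIONS = {
    "Roosevelt": (41.8674, -87.6266),
    "Jackson": (41.8781, -87.6277),
    "95th/Dan Ryan": (41.7224, -87.6244),
}

def nearest_station(lat, lon, stations=STATIONS):
    """Name of the station with the smallest Haversine distance to the point."""
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))

print(nearest_station(41.8675, -87.6270))
```

Because each crime is assigned to whichever station is closest, outliers far from any station (such as a mis-geocoded record) still receive a label, so a distance cutoff could be added if those should be excluded.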

The map below shows the crimes for each of the top 3 most dangerous stations. Two of these stations are in The Loop, and the other is farther south. All three are Red Line stations.

While knowing the top 3 most dangerous stations is important, I also wanted to know the time of day when most crimes occur. This gives people more information in case they are traveling through these stations. To find this information and graph it, I extracted the hour of each crime that happened at the top 3 stations and created a kernel density estimate (KDE) plot to easily identify the peaks in crime. I found that crimes happen most often after midnight, around 2 AM, and during rush hour. This trend held true for all 3 stations.

Lastly, I wanted to explore just how much more dangerous Roosevelt, the most crime ridden station, was than the average station. In order to find this number, I found the number of crimes within Roosevelt station and compared it to the average number of crimes per station. This would be important for my recommendations to stakeholders to illustrate why safety is important around The Loop.

The Roosevelt station has about 7.3 times as much crime as the average station.
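
The comparison against the average station reduces to a ratio of counts, sketched below with toy labels. The sketch assumes the average is taken over stations that have at least one assigned crime; averaging over all stations in the system would give a larger ratio.

```python
from collections import Counter
from statistics import mean

# Toy nearest-station labels, one per crime (illustrative counts only).
station_per_crime = ["Roosevelt"] * 6 + ["Jackson"] * 2 + ["Harrison"] * 1

crimes_per_station = Counter(station_per_crime)

# Average number of crimes over stations that appear at least once.
average_per_station = mean(crimes_per_station.values())

ratio = crimes_per_station["Roosevelt"] / average_per_station
print(f"Roosevelt has about {ratio:.1f} times as much crime as the average station.")
```

Formatting the ratio to one decimal place keeps the reported figure readable.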

6.4 Analysis 4

By Paisley Lucier

For my analysis, I explored the associations between the proportion of committed crimes that resulted in arrest and the crime location. Since arrest proportion is a critical metric for policing, I used police district to describe location to ensure applicability for stakeholders. I also wanted to consider a police district’s sentiment score to see if there were associations between a district’s sentiment score and its arrest proportion. As ‘Arrest’ is a boolean value, I had to find representative and effective ways to bin the data, before landing on binning police districts by side of Chicago. Ultimately, my analysis looks at general trends in location by considering the ‘side’ of districts, while still maintaining the specificity of district for other analyses in order to offer recommendations for police in specific districts.

Firstly, I generally looked at the proportion of observations that resulted in arrest by each side of Chicago, as well as the average sentiment score by side.

This bar graph on the left shows the proportion of arrested crime across side. The North side has the highest overall arrest proportion–higher than both the South and Central sides, which have very similar proportions. On the right, we can see that the North side also has the highest average police sentiment score. However, the Central side is very close in average score, and the South side’s average score is notably below the other two.
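
The arrest proportion by side can be sketched as a group-wise mean of the boolean Arrest column. The district-to-side mapping below is a hypothetical three-district excerpt and the rows are toy data; the actual binning followed the district map referenced in the data preparation section.

```python
import pandas as pd

# Hypothetical excerpt of the district-to-side mapping (illustration only).
SIDE_BY_DISTRICT = {1: "Central", 7: "South", 18: "North"}

crimes = pd.DataFrame({
    "District": [1, 1, 7, 7, 7, 7, 18, 18, 31],
    "Arrest": [True, False, False, False, False, True, True, True, False],
})

# District 31 is excluded, then each remaining district is binned by side.
crimes = crimes[crimes["District"] != 31].copy()
crimes["Side"] = crimes["District"].map(SIDE_BY_DISTRICT)

# Mean of a boolean column = proportion of crimes that resulted in arrest.
arrest_by_side = crimes.groupby("Side")["Arrest"].mean()
print(arrest_by_side)
```

Grouping by both side and primary crime type in the same way would give the per-crime comparison across sides.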

Considering differences in arrest rate for different crime types, I next looked at if the proportion of people arrested for the same primary type of crime differs across side.

The barplot above shows the proportion of observations arrested for each of the 10 crimes with the most overall observances in the data, separated by the side the crime was committed in. Within this bar chart, we can see that the disparities in arrest proportion across sides prevail, though smaller in magnitude. This graph shows us that of the top 10 most frequently recorded crimes, the North Side’s arrest proportion is higher for 8 of them. Additionally, we can see that some crimes have much higher arrest proportions across all sides than others: narcotics and weapons violations have higher arrest proportions than other crimes.

I next considered the association between arrest proportion and average sentiment score for each district for the top 10 crimes. The 5 crimes of Robbery (0.658), Battery (0.598), Theft (0.495), Assault (0.459), and Burglary (0.442) had the highest correlation between a district’s average sentiment score and its arrest rate (with the next highest having a correlation coefficient of 0.26). Considering comparable qualities, including arrest rate and trendline (see appendix A.1), I binned these five crimes with the highest correlations by general type (physical assault and theft) and visualized them below.

The visualization above portrays that for the crime types that are assault and battery, as well as the crime types of burglary, robbery, and theft, there is a moderate, positive, and linear association between the proportion arrested in a district and average sentiment for a district. This means that in general, districts with higher sentiment scores tend to have higher arrest proportions for the types of crimes included above.
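
A correlation of this kind can be sketched by computing one arrest proportion and one average sentiment score per district and correlating the two series. All values and column names below are toy assumptions, constructed so that the relationship is perfectly linear.

```python
import pandas as pd

crimes = pd.DataFrame({
    "District": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Primary Type": ["ROBBERY"] * 12,
    "Arrest": [True, True, False, False,
               True, False, False, False,
               False, False, False, False],
})
sentiment = pd.DataFrame({
    "District": [1, 1, 2, 3],
    "Safety": [62, 60, 55, 50],
    "Trust": [60, 60, 55, 52],
    "Respect": [61, 60, 55, 48],
    "Listen": [57, 60, 55, 50],
})

# Collapse the four highly correlated scores into one, then average per district.
score_cols = ["Safety", "Trust", "Respect", "Listen"]
sentiment["Score"] = sentiment[score_cols].mean(axis=1)
avg_sentiment = sentiment.groupby("District")["Score"].mean()

# Arrest proportion per district for a single crime type.
robbery = crimes[crimes["Primary Type"] == "ROBBERY"]
arrest_prop = robbery.groupby("District")["Arrest"].mean()

# Series.corr aligns the two series on the district index (Pearson by default).
r = arrest_prop.corr(avg_sentiment)
print(round(r, 3))
```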

Lastly, I wanted to look at which crime-district combinations had the highest disparity in arrest proportion. Below is a dataframe displaying the top 10 most observed crime types and their districts with the highest/lowest arrest proportion, as well as the difference between the highest and lowest proportion.

Primary Type         District of Max Arrest Proportion  District of Min Arrest Proportion  Difference
Weapons Violation                                 18.0                               17.0    0.516396
Robbery                                           16.0                                5.0    0.134063
Narcotics                                         19.0                                6.0    0.116667
Assault                                            1.0                               12.0    0.085086
Burglary                                          18.0                                4.0    0.083151
Battery                                            1.0                                7.0    0.078671
Theft                                              1.0                                7.0    0.072987
Criminal Damage                                   16.0                                2.0    0.036573
Deceptive Practice                                22.0                               25.0    0.028950
Motor Vehicle Theft                               20.0                                9.0    0.028729

From the dataframe above (sorted by the difference in arrest proportion), we can see that, of the top 10 crimes, weapons violation has the highest arrest disparity, followed by robbery and narcotics. These top 3 all have a difference of over 10 percentage points. Additionally, 5 out of 10 of the districts with the minimum arrest proportions are on the South side of Chicago.
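
A summary table of this shape can be built by pivoting arrest proportions into a crime-by-district grid and taking the row-wise extremes. The rows below are toy data and this is only one possible construction, not necessarily how the table above was produced.

```python
import pandas as pd

crimes = pd.DataFrame({
    "Primary Type": ["ROBBERY"] * 4 + ["THEFT"] * 6,
    "District": [16, 16, 5, 5, 16, 16, 16, 16, 5, 5],
    "Arrest": [True, False, False, False,
               True, False, False, False, False, False],
})

# Rows: crime type, columns: district, values: arrest proportion.
props = (
    crimes.groupby(["Primary Type", "District"])["Arrest"]
    .mean()
    .unstack("District")
)

summary = pd.DataFrame({
    "District of Max Arrest Proportion": props.idxmax(axis=1),
    "District of Min Arrest Proportion": props.idxmin(axis=1),
    "Difference": props.max(axis=1) - props.min(axis=1),
}).sort_values("Difference", ascending=False)
print(summary)
```

Restricting the input to crime types with enough observations in every district keeps a single sparsely populated cell from dominating the difference.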

7 Conclusions & Recommendations

Our individual analyses answer the broader question of how to promote personal and community safety and welfare within Chicago. This plays into people’s satisfaction with policing and how to improve these sentiments, along with suggestions on how people should look out for themselves when traveling or living in the city. When examining the various trends yielded by our analyses, it is clear that across theft, general crime, and CTA crime, rush hour and midnight are the most dangerous times. Additionally, theft is very common across Chicago, whether it be on the street, in residential homes, or in transportation areas, so stakeholders should be vigilant about their possessions, and can feel less anxious about murder, for example, which only makes up 0.3% of total crimes.

Alyssa’s recommendation
Looking overall at the types of crimes committed during the day, the main takeaway for visitors and residents of Chicago is that crime during the day is just as rampant as crime at night. Our analysis shows that the top 10 most common crimes are consistent from morning rush hour until midnight. In addition, the top 3 locations are identical for all time frames. Be especially careful during evening rush, because that is when the degree of public crime is highest. Note that crimes are just as public in the morning and at noon as they are at midnight, so do not assume that you are safer in broad daylight surrounded by many people. In addition, the analysis shows that crime is just as likely to be violent at any hour of the day. Some people like to carry personal safety equipment like pepper spray when going out at night. This equipment is just as crucial during daytime hours as it is after dark, so I advise stakeholders to take that into consideration.

Stakeholders should keep in mind that I did not analyze every hour of the day. I am using the 8 hours that I did analyze to generalize recommendations that I believe will hold for the other 16 hours. The data we analyzed is fairly recent, so stakeholders do not need to repeat my analysis. However, depending on their occupation, usual commuting routes, and most-frequented locations, stakeholders should do extra research on those specific areas, since conditions there may differ from the analysis I’ve presented. Please also take a look at Grace Shao’s CTA analysis if you often commute via L train.

Grace Chang’s recommendation
Next, based on the analysis of theft crimes, stakeholders (anyone who frequents or resides in Chicago) are advised to pay more attention to their personal belongings in the region consisting of The Loop, Near North Side, Near West Side, and West Town. This region is popular for travel, as it includes financial districts and tourist attractions such as the Magnificent Mile and the Bean, so many stakeholders are affected by this result. Furthermore, given that 33% of all thefts are thefts of financial assets over 500 dollars, and 28% are thefts of under 500 dollars, it is essential to be attentive to one’s financial possessions. Meanwhile, pick-pocketing represents only a small percentage (5.16%) of total theft crimes, so stakeholders can be assured that this crime is less common, contrary to the common assumption that pick-pocketing is a heavy concern when it comes to theft.
There are a few limitations that stakeholders should keep in mind. This analysis does not include motor vehicle theft, another common type of theft, because motor vehicle theft has its own subsets of theft types that clash with or overwhelm the general theft category, which made deeper analysis of the general theft category difficult. Additionally, within these community areas there are neighborhoods whose crime rates can vary, but these go beyond the scope of our research and dataset, so stakeholders should do further analysis on the specific neighborhood(s) they are visiting.

Grace Shao’s recommendation
On the CTA, it is clear that The Loop has the highest amount of crime by far, with more than 4x as much crime as the next most dangerous community area. As for what types of crimes to look out for in this area, pick-pocketing and simple assault are the most likely. Theft is much more likely to occur on the train than on the platform, so it is more important to watch your belongings closely and keep valuables out of sight while on the train. Physical or violent crimes, by contrast, have a higher chance of happening on the platform. Therefore, avoid making contact with others on the platform and leave space between yourself and others. Since The Loop is a popular tourist area, with landmarks such as the river walk, the Art Institute, and Cloud Gate, many stakeholders may be traveling there, and it is especially important to stay alert.

As for specific stations, avoid Roosevelt, 95th/Dan Ryan, and Jackson when possible, especially around midnight and 6-7 pm, when the crime rate peaks. To put it in perspective, the chances of crime at Roosevelt, the station with the most crime, are 7.32x higher than at the average station. By following these recommendations, stakeholders can stay safe while traveling in the city.
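A comparison between one station and the average station can be sketched as follows. This is only an illustration with made-up counts and placeholder station names: it assumes the comparison divides a station’s crime count by the mean count over all stations (including that station itself), which the report does not spell out, and `ratio_to_average` is a hypothetical helper.

```python
# Hypothetical crime counts per station (placeholder names and values,
# not figures from the CTA analysis).
station_counts = {
    "Station A": 60,
    "Station B": 20,
    "Station C": 10,
    "Station D": 10,
}


def ratio_to_average(counts):
    """Express each station's crime count as a multiple of the mean count.

    Assumption: the mean is taken over every station in `counts`,
    including the station being compared.
    """
    mean_count = sum(counts.values()) / len(counts)
    return {station: count / mean_count for station, count in counts.items()}


ratios = ratio_to_average(station_counts)
most_dangerous = max(ratios, key=ratios.get)
```

With these placeholder counts the mean is 25 crimes per station, so the busiest station comes out at 2.4 times the average. A ratio computed this way depends on which stations are included in the mean, so it should be read alongside the raw counts.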

Paisley’s recommendation
With regard to police stakeholders, the police should allocate resources, and devote more research to demographic information and district needs, in order to pinpoint the roots of the disparities in arrest rates across districts for the same type of crime: namely weapons violations in districts 17 and 18, robbery in districts 5 and 16, and narcotics in districts 6 and 19, each of which has an arrest proportion disparity of more than 10% between the named districts.

In particular, among the associations between a district’s arrest proportion and its sentiment rating, robbery shows the highest correlation with the district’s police sentiment score, and it is also among the top 3 crimes (of the top 10) by arrest disparity. Thus, police should allocate resources to the prevention of robbery in district 5, reconsider their arrest tactics, and seek community input to aim for higher sentiment scores. (Note: district 5 has the lowest robbery arrest proportion, followed by districts 3, 6, and 12, so this recommendation extends to those districts as well.)

Appendix

7.0.1 A.1 - Analysis 4 reference: Associations between a district’s arrest proportion for a type of crime and the district’s sentiment score, top 10 crimes

7.0.2 A.2 - Comparison to previous project on Chicago crime

The previous year’s analysis of crime types throughout the day focused on specific community areas that Northwestern students frequently visit. It also found peaks in the number of crimes at 12 am and 12 pm. Our analysis is more general, covering crimes in all areas at peak times.

Previous year’s CTA findings: While the previous year’s project did not analyze all of the stations, it did single out Howard station as one of interest, given that students use it frequently. It found that crime at Howard peaked just after midnight and around rush hour, which matches my analysis of the top 3 most dangerous stations. At those stations, the most dangerous times were also around 1-2 AM and rush hour.